home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Amiga Format CD 42
/
Amiga Format AFCD42 (Issue 126, Aug 1999).iso
/
-serious-
/
programming
/
other
/
jikes
/
src
/
unzip.h
< prev
next >
Wrap
C/C++ Source or Header
|
1999-05-14
|
17KB
|
374 lines
// $Id: unzip.h,v 1.2 1999/01/13 21:48:42 shields Exp $
#ifndef unzip_INCLUDED
#define unzip_INCLUDED
#include "config.h"
//
// NOTE: Jikes incorporates compression code from the Info-ZIP
// group. There are no extra charges or costs due to the use of
// this code, and the original compression sources are freely
// available from http://www.cdrom/com/pub/infozip/ or
// ftp://ftp.cdrom.com/pub/infozip/ on the Internet.
// The sole use by Jikes of this compression code is contained in the
// files unzip.h and unzip.cpp, which are based on Info-ZIP's inflate.c and
// associated header files.
//
//
// You can do whatever you like with this source file, though I would
// prefer that if you modify it and redistribute it that you include
// comments to that effect with your name and the date. Thank you.
// The abbreviated History list below includes the work of the
// following:
// M. Adler, G. Roelofs, J-l. Failly, J. Bush, C. Ghisler, A. Verheijen,
// P. Kienitz, C. Spieler, S. Maxwell, J. Altman
// Only the first and last entries from the original inflate.c are
// reproduced here.
//
//
// History:
// vers date who what
// ---- --------- -------------- ------------------------------------
// a ~~ Feb 92 M. Adler used full (large, one-step) lookup table
// ...
// c16 20 Apr 97 J. Altman added memzero(v[]) in huft_build()
//
#define DFUNZIP /* needed for unzip compilation*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//
// inflate.c -- put in the public domain by Mark Adler
// version c15c, 28 March 1997
//
//
// Inflate deflated (PKZIP's method 8 compressed) data. The compression
// method searches for as much of the current string of bytes (up to a
// length of 258) in the previous 32K bytes. If it doesn't find any
// matches (of at least length 3), it codes the next byte. Otherwise, it
// codes the length of the matched string and its distance backwards from
// the current position. There is a single Huffman code that codes both
// single bytes (called "literals") and match lengths. A second Huffman
// code codes the distance information, which follows a length code. Each
// length or distance code actually represents a base value and a number
// of "extra" (sometimes zero) bits to get to add to the base value. At
// the end of each deflated block is a special end-of-block (EOB) literal/
// length code. The decoding process is basically: get a literal/length
// code; if EOB then done; if a literal, emit the decoded byte; if a
// length then get the distance and emit the referred-to bytes from the
// sliding window of previously emitted data.
// There are (currently) three kinds of inflate blocks: stored, fixed, and
// dynamic. The compressor outputs a chunk of data at a time and decides
// which method to use on a chunk-by-chunk basis. A chunk might typically
// be 32K to 64K, uncompressed. If the chunk is uncompressible, then the
// "stored" method is used. In this case, the bytes are simply stored as
// is, eight bits per byte, with none of the above coding. The bytes are
// preceded by a count, since there is no longer an EOB code.
// If the data are compressible, then either the fixed or dynamic methods
// are used. In the dynamic method, the compressed data are preceded by
// an encoding of the literal/length and distance Huffman codes that are
// to be used to decode this block. The representation is itself Huffman
// coded, and so is preceded by a description of that code. These code
// descriptions take up a little space, and so for small blocks, there is
// a predefined set of codes, called the fixed codes. The fixed method is
// used if the block ends up smaller that way (usually for quite small
// chunks); otherwise the dynamic method is used. In the latter case, the
// codes are customized to the probabilities in the current block and so
// can code it much better than the pre-determined fixed codes can.
// The Huffman codes themselves are decoded using a multi-level table
// lookup, in order to maximize the speed of decoding plus the speed of
// building the decoding tables. See the comments below that precede the
// lbits and dbits tuning parameters.
// GRR: return values(?)
// 0 OK
// 1 incomplete table
// 2 bad input
// 3 not enough memory
//
//
// If BMAX needs to be larger than 16, then h and x[] should be unsigned long.
//
#define BMAX 16 /* maximum bit length of any code (16 for explode) */
#define N_MAX 288 /* maximum number of codes in any set */
//
// Notes beyond the 1.93a appnote.txt:
// 1. Distance pointers never point before the beginning of the output
// stream.
// 2. Distance pointers can point back across blocks, up to 32k away.
// 3. There is an implied maximum of 7 bits for the bit length table and
// 15 bits for the actual data.
// 4. If only one code exists, then it is encoded using one bit. (Zero
// would be more efficient, but perhaps a little confusing.) If two
// codes exist, they are coded using one bit each (0 and 1).
// 5. There is no way of sending zero distance codes--a dummy must be
// sent if there are none. (History: a pre 2.0 version of PKZIP would
// store blocks with no distance codes, but this was discovered to be
// too harsh a criterion.) Valid only for 1.93a. 2.04c does allow
// zero distance codes, which is sent as one code of zero bits in
// length.
// 6. There are up to 286 literal/length codes. Code 256 represents the
// end-of-block. Note however that the static length tree defines
// 288 codes just to fill out the Huffman codes. Codes 286 and 287
// cannot be used though, since there is no length base or extra bits
// defined for them. Similarily, there are up to 30 distance codes.
// However, static trees define 32 codes (all 5 bits) to fill out the
// Huffman codes, but the last two had better not show up in the data.
// 7. Unzip can check dynamic Huffman blocks for complete code sets.
// The exception is that a single code would not be complete (see #4).
// 8. The five bits following the block type is really the number of
// literal codes sent minus 257.
// 9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits
// (1+6+6). Therefore, to output three times the length, you output
// three codes (1+1+1), whereas to output four times the same length,
// you only need two codes (1+3). Hmm.
//10. In the tree reconstruction algorithm, Code = Code + Increment
// only if BitLength(i) is not zero. (Pretty obvious.)
//11. Correction: 4 Bits: # of Bit Length codes - 4 (4 - 19)
//12. Note: length code 284 can represent 227-258, but length code 285
// really is 258. The last length deserves its own, short code
// since it gets used a lot in very redundant files. The length
// 258 is special since 258 - 3 (the min match length) is 255.
//13. The literal/length and distance code bit lengths are read as a
// single stream of lengths. It is possible (and advantageous) for
// a repeat code (16, 17, or 18) to go across the boundary between
// the two sets of lengths.
//
#define PKZIP_BUG_WORKAROUND /* PKZIP 1.93a problem--live with it */
//
// inflate.h must supply the unsigned char slide[WSIZE] array, the zvoid typedef
// (void if (void *) is accepted, else char) and the NEXTBYTE,
// FLUSH() and memzero macros. If the window size is not 32K, it
// should also define WSIZE. If INFMOD is defined, it can include
// compiled functions to support the NEXTBYTE and/or FLUSH() macros.
// There are defaults for NEXTBYTE and FLUSH() below for use as
// examples of what those functions need to do. Normally, you would
// also want FLUSH() to compute a crc on the data.
// This module uses the external functions malloc() and free() (and
// probably memset() or bzero() in the memzero() macro). Their
// prototypes are normally found in <string.h> and <stdlib.h>.
//
/* #define DEBUG */
#ifndef WSIZE /* default is 32K */
# define WSIZE 0x8000 /* window size--must be a power of two, and at least */
#endif